Abstract

This report is of a round-table discussion held in Cardiff in September 2009 for Cesagen, a research centre within 
the Genomics Network of the UK's Economic and Social Research Council. The meeting was arranged to explore 
ideas as to the likely future course of human genomics. The achievements of genomics research were reviewed, 
and the likely constraints on the pace of future progress were explored. New knowledge is transforming biology 
and our understanding of evolution and human disease. The difficulties we face now concern the interpretation 
rather than the generation of new sequence data. Our understanding of gene-environment interaction is held back 
by our current primitive tools for measuring environmental factors, and in addition, there may be fundamental 
constraints on what can be known about these complex interactions. 



Introduction 

The advent of 'high throughput' or 'next generation' 
genomic sequencing technologies has raised expecta- 
tions of what laboratory genetics has, and will have, to 
offer to both the clinician and the patient. Our greatly 
increased ability to generate nucleic acid sequence data 
raises the question of how this rapidly accumulating 
mass of new genomic information will be interpreted 
and when this will become feasible in general clinical 
practice. How much sense will have been made of the
new data within the next 10 to 15 years? What new 
questions will it be possible for us to pose once these 
new technologies are readily available to provide the in- 
formation upon which biologically meaningful answers 
to these questions can be based? The extended lag time 
between generating a basic understanding of the patho- 
genesis of many single gene (Mendelian) disorders and de- 
vising effective remedies has long been acknowledged. In 
the context of complex disorders and the many known 
quantitative (non-disease) traits, will the lag time between 
data collection and its interpretation be shortened? 
Such questions were addressed at a round-table discussion
held in Cardiff in September 2009. This was
arranged as part of the work programme of Cesagen, a 
joint research centre at the Universities of Cardiff 
(Wales) and Lancaster (England) established by the UK's 
Economic and Social Research Council as part of its 
Genomics Network. Cesagen studies the societal impact 
of developments in genetics and genomics; the meeting 
was arranged to explore and discuss ideas as to the likely 
future course of human genomics. The new sequencing
technologies will permit researchers to ask new questions of theoretical
(biological, e.g. evolutionary) significance and practical
(medical and other) application.

The starting point for the discussion was the recent 
leap in DNA sequencing capability developed by several 
different commercial enterprises. There was no need to 
dwell upon the specifics of the technologies; rather, the 
point was to address the potential benefits for the 
sciences and clinical medicine of having large volumes 
of genomic sequence data available, effectively without 
major constraints of time or cost and alongside an emer- 
ging capacity to determine the CpG methylation status 
of the corresponding nucleotide sequences, the 'methylome',
and other modifications of chromatin [1]. Several
short presentations were made during the round-table 
sessions, but the emphasis of the meeting was on open discussion
among the 20 participants, who have backgrounds
in human and evolutionary genetics, clinical medicine, social
science and philosophy. The perspectives developed at
the meeting have been refined 'virtually' since then through 
multiple cycles of e-mail exchanges. 

Experience of exomics 

Useful insights had already been obtained through two 
large studies: the Genetics of Learning Disability 
(GOLD) study with complete X chromosomal 'exome' 
sequences from >200 patients with sex-linked cognitive 
impairment [2] and the description of the full exomes of 
12 selected individuals [3]. Frances Lucy Raymond 
(Cambridge, England) was able to draw a number of 
general lessons from the GOLD study: 

• Sample quality is critical for obtaining reliable DNA 
sequence data; poor sample quality generates 
multiple sequence variants per sample, making it 
difficult to distinguish real sequence from 
experimental artefact; the assumption that poor 
quality samples would not amplify was false. 

• Truncating sequence variants - termed nulls - are 
found in approximately 1% of genes on the X 
chromosome but are nevertheless compatible with 
normal life in the hemizygous male. 

• Missense variants are common, with an average of four 
unique single nucleotide non-synonymous variants per 
family with X-linked intellectual disability. 

• Data access remains a sensitive issue; there is a need 
to publish variants and allele frequencies while 
preserving the anonymity of research participants. 
This would generally entail publishing aggregate 
data and new variants rather than individual-specific 
haplotype information. 

These findings emphasise what is already well known, 
that a disrupted gene need not necessarily lead to a clin- 
ically overt disease state, i.e. many genes are dispensable, 
and even for inactivated essential genes, penetrance will 
often be incomplete [4,5]. The difficulties inherent in this 
analysis with regard to the interpretation of previously un- 
reported variants - mutations that may be of pathogenic 
significance - will be even more substantial when we con- 
sider autosomal loci, of which (almost) everyone will have 
two copies [6]. Additional new approaches to determine 
the functional significance of genomic variation will be
required if any coherent interpretation of the approaching 
deluge of data is to be feasible [7]. 

Epigenetics 

Epigenetic influences together introduce an additional 
layer of complexity over and above the functionality of
genomic DNA sequences [4]. The inter-relationship of
DNA sequence and DNA sequence modification has not 
yet been adequately addressed in either theory or experi- 
ment; it should, however, be noted that this relationship 
is likely to be bi-directional with certain polymorphic 
variants influencing methylation status and the methyla- 
tion status in turn influencing mutability. Ros John (Car- 
diff, Wales) introduced this topic and its relationship to 
known imprinting phenomena, referring to both epi- 
demiological and experimental work, viz. records of the 
Dutch famine at the end of World War II [8] and con- 
temporary work with the agouti mouse [9]. The term 
'epigenetics' includes the study of both conventional gene- 
gene (GxG) interactions, long recognised in principle but in 
practice exceedingly difficult to measure in humans, and
the still under-researched area of 'predictive adaptive 
responses'. The latter has been elaborated theoretically [10], 
triggering very interesting - but so far perhaps more tanta- 
lising than illuminating - laboratory work on the possible 
influence of early life experiences (before or soon after 
birth) on patterns of DNA methylation and subsequent dis- 
ease susceptibility in the face of dietary challenges and 
other environmental circumstances. 

Our understanding of the intricate role of the epigen- 
ome in directing differentiation, communicating cell fate 
and mediating adaptive response will be informed by 
comprehensive surveys of epigenetic marks across space 
(cell type/state) and time (development and ageing), and 
in response to environmental challenges (diet, stress).
To this end, a number of ingenious genome-wide, high- 
throughput technologies have been developed. Some are 
based on the ability to selectively capture methylated 
fragments of the genome using methyl-binding proteins 
and then sequencing these fragments. MeDIP-seq, 
MBD-seq and MethylCap-seq have all been applied suc- 
cessfully, although with differences in selectivity for spe- 
cific sequences [11,12]. A readout of the whole genome 
methylation status can also be obtained by applying a 
bisulfite-sequencing technique, whereby genomic DNA is 
exposed to bisulfite (HSO3-), which deaminates only the unmethylated
(and, hence, unprotected) cytosine residues resulting in a 
change in the DNA sequence. This technique may provide 
greater coverage of the genome [13]. The genome-wide as- 
sessment of histone modifications is a more complex task 
since these modifications are numerous, and activating and 
silencing marks are not necessarily mutually exclusive [14]. 
Chromatin immunoprecipitation followed by high- 
throughput sequencing is the preferred technique for deter- 
mining the genome-wide locations of specific histone modi- 
fications [15,16]. Although there are still issues, in 
particular, the biased representation of certain sequences,
computational analyses and further technical refinements 
will undoubtedly increase the sensitivity of these techniques 
and further reduce the costs. 
One potentially very important aspect of GxG and
gene-environment (GxE) interactions that may limit 
their importance, or at least our ability to detect them, is 
the small effective population size (Ne in population 
genetic notation) of humans [17]. Another important 
question is the extent to which the assessment of CpG 
methylation can serve as an adequate biomarker for the full 
range of epigenetic influences. A systems biology approach 
to the recognition of biological networks may be one way 
of addressing GxG and GxE interactions in humans [18,19], 
given that the types of breeding experiments pioneered in 
Drosophila species are clearly ruled out in humans, not 
only on ethical grounds but also by the restrictions of time- 
scale available to researchers. The collection of biological 
samples and data on the large Helmholtz cohort in Ger- 
many is intended to address some of these issues. 

Behaviour and psychiatry 

When one turns to consider human behaviour and psy- 
chiatric disorders, rather than the (relatively) simple 
phenotypes associated with human growth or with meta- 
bolic and developmental disorders, additional layers of 
complexity become evident in the GxE interactions. As 
Anita Thapar (Cardiff, Wales) made clear, there is a 
pressing need for good measures of both the environ- 
mental factors and the behavioural phenotypes of poten- 
tial interest. The GxE interactions can be looked at as 
purely statistical phenomena or, more interestingly and 
perhaps more importantly, as phenomena of potential 
biological meaning. Here more than anywhere, and in 
deference to the great David Hume, it is essential to 
avoid the all-too-easy slide from the observation of an 
association into unwarranted assumptions about caus- 
ation. Thus, if one observed an association between atten- 
tion deficit hyperactivity disorder in children and maternal 
cigarette smoking, what causal influences might have been 
at work: for example, an environmental effect of smoking in pregnancy
or a common genetic basis to both behaviours?
One opportunity to clarify such questions, although diffi- 
cult to establish, would be longitudinal studies of children 
born following conception by IVF. Some of these ques- 
tions are profound, touching our very humanity; for in- 
stance, how (through what process) does the early 
maltreatment of children result in later antisocial behav- 
iour? How may social interventions be designed effectively 
to interrupt such pathogenetic linkages, for the long-term 
benefit of society as a whole? The Bradford-Hill criteria 
[20] address this problem of causation but have yet to be 
fully tested in the genomic era. 

Improved methods of measuring environmental factors 
and their behavioural correlates are clearly required for 
the full benefits of the progress in genome sequencing to 
be attained; these measurements should be made without 
lapsing into implicitly deterministic assumptions. 



Common disease genetics 

Michael Krawczak (Kiel, Germany) challenged the 
assumptions underlying much 'common disease genetics' 
- especially the limitations of its most influential models. 
He expanded upon three critical underlying assumptions 
made by Reich and Lander [21,22]: 

• The allelic spectrum of common disease genes is 
simple. 

• There was a single, sudden expansion event of the 
human population (at least, the out-of-Africa
population) from 10,000 members to its modern 
size. 

• The part played by selection in the spread of alleles 
predisposing to common diseases was relatively 
small. 

This model has led to the neglect of new mutants aris- 
ing from existing variants during the turnover of the al- 
lelic spectrum of common disease. Krawczak explained 
why these assumptions are highly implausible and what 
effect this is likely to have had on the current allelic 
spectrum of genetic predisposition to complex disease. 
In particular, as is apparent both when other modelling 
approaches, such as coalescence theory, are used and 
from the data currently accumulating: 

• The allelic spectrum of complex disorders, like that 
of Mendelian disorders, is likely to be diverse. 

• There may be only a weak correlation between the risk 
of disease and the population frequency of risk alleles. 

• The functional effects of predisposing alleles may be 
weakly or even inversely related to the associated 
disease risk (some predisposing alleles may even, 
counter-intuitively and due to stochastic processes, be 
more frequent in unaffected than affected individuals).

• The mutational load in unaffected individuals may 
be particularly high for common diseases. 

Mapping the 'polygenes' 

Krawczak also discussed the limitations of the available 
strategies for mapping the loci which contribute to the 
common, complex disorders. The power of the affected 
sib pair (ASP) linkage approach has been regarded as 
low [21] in comparison to association studies employing 
the transmission disequilibrium test (TDT), not a case- 
control design, but that view also had shortcomings: 

• The paper assumed knowledge of the risk allele 
when assessing the TDT but not when assessing the 
ASP method. 

• The relative risks invoked were unrealistically high. 

• The association approach was found to be 
advantageous for studies of candidate genes (or 
regions) but not necessarily for genome-wide
methods exploiting linkage disequilibrium, such as 
genome-wide association studies (GWAS). 

There are good grounds for using both family studies 
and GWAS to utilise all the available evidence [23]. 
Family-based approaches that could be used with whole- 
genome sequencing (potential scenarios) include: 

• sequencing the index patients and evaluating the 
inheritance pattern of 'interesting' genes or gene 
regions in their families 

• sequencing the parents of the patient, comparing 
the results to population controls, and using TDT 
on 'promising' alleles 

• assessing the co-inheritance pattern of unlinked 
genes in families to derive candidate regions for 
GxG interactions 

• defining and assessing pre-disease or sub-clinical 
disease states in genetically-defined high-risk 
individuals identified in families. 
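The TDT referred to above can be illustrated with a minimal sketch. It assumes the standard McNemar-type form of the test, in which heterozygous parents transmit a candidate allele to an affected child b times and fail to transmit it c times, and the statistic (b - c)^2 / (b + c) is compared against a chi-square distribution with one degree of freedom. The function names and the counts used are hypothetical and chosen only for illustration; they are not taken from the meeting.

```python
from math import erfc, sqrt


def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT statistic: (b - c)^2 / (b + c).

    `transmitted` (b) and `untransmitted` (c) are counts of how often
    heterozygous parents did or did not pass the candidate allele to an
    affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
stat = tdt_statistic(60, 40)
print(f"TDT statistic = {stat:.2f}, p = {chi2_1df_pvalue(stat):.4f}")
```

Because only heterozygous parents contribute, the test is informative only for 'promising' alleles that are common enough among the sequenced parents, which is one reason why its power depends so strongly on the assumed risk allele.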

Andrew O M Wilkie (Oxford, England) mentioned that 
the supposed 'triumphs' of the common disease/common 
variants (CD/CV) model conveniently ignore the fact that 
the relative risks conferred by susceptibility alleles cluster 
much more closely around 1.0 than anyone envisaged two 
decades ago; relative risks above 2.0 are decidedly unusual. 
Hence, the majority of heritability remains unexplained, 
and useful personalised prediction in healthy individuals 
(largely based on family history) has hardly improved over 
this time. 

Additional challenges to our understanding include 
the greater heritability of diseases (e.g. schizophrenia, as 
presented by Nick Craddock of Cardiff, Wales) than 
would be expected given the CD/CV model of disease 
causation and the greater difficulty of interpreting any 
estimates of heritability when the disease phenotype 
results from the joint action of several predisposing 
genes (as Helen M Wallace (Buxton, UK) reminded us). 
Whether these insights into pathogenesis will eventually 
convert to promised improvements in human health 
remains to be seen. 

There was discussion about the possible 'replication' 
of the results of GWAS, led by Mathias Chiano (Steven- 
age, England). Even the question of what exactly 'repli- 
cation' would entail needs to be clarified, as when 
specifying the population(s) from which study partici- 
pants are drawn. If an attempt is made to replicate a 
given study in a different population, the meaning of a 
negative result will be unclear because the relevant gen- 
etic factors may differ between this and the original 
population; there are no easy answers in relation to 
population differences. 



Selection 

Chris Tyler-Smith (Hinxton, England) continued with 
the theme of human history and the contemporary 
traces of past selection. Selection can take many forms 
but the presence of high levels of polymorphism sug- 
gests either an ancient origin for neutral mutations or 
the maintenance of polymorphism by fluctuating or bal- 
ancing selection. Recognising past selection is possible 
by (i) counting offspring, (ii) looking for patterns of 
DNA variation in populations that depart from neutrality 
(such as long haplotypes, skewed allele frequency spec- 
tra, large differences between populations), and (iii) 
functional studies. Recent genomic studies comparing 
populations have drawn upon the second category of 
evidence that detects the net effects of positive selection 
over millennia, but these have mostly been based on 
simple models of selection and are less able to detect 
phenomena such as heterozygote advantage, frequency- 
dependent selection, disruptive selection - selection for 
both homozygotes - or antagonistic but stabilising selec- 
tion with contrary effects at different stages of the life 
cycle in the two sexes or in different environments. 

While there are grounds for thinking that selective 
'sweeps' have been uncommon in recent human evolu- 
tion [24], the role of selection in maintaining substantial 
levels of polymorphism remains less clear. Although a 
biological (natural selection-based) account for many 
genetic observations consistent with selection remains 
elusive, this could be the result of the challenging nature 
of such studies. The persistence of polymorphism with- 
out a clear genomic signature may result from changes 
in the strength and direction of selection. Despite these 
complexities, there are widely accepted examples of 
positive selection, including disease-associated variants 
and those influencing visible traits such as skin colour 
or hair, where sexual selection through gene-culture 
interaction may have been relevant [25,26].

Human genetic variation: the 1000 Genomes 
Project 

Results from the 1000 Genomes Project [27] and other 
studies indicate that, despite the low level of genetic 
variation in humans compared with other apes, there 
are still enormous inter-individual differences in gen- 
ome sequence, which can now be identified in a com- 
prehensive way at the level of CNVs, indels and SNPs. 
Notably, there are on average 250 to 300 loss-of- 
function variants such as 'nonsense' or splicing SNPs in 
the genome of the average 'healthy' person, 50 to 100 
previously implicated, disease-associated variants and 
approximately 80 genes that vary in copy number. As 
suggested by the GOLD study (op cit), this greatly 
complicates the clinical interpretation of an individual's 
genome sequence. 
The first goal of the 1000 Genomes Project has been to
capture important data about background human genetic 
variation which will be essential in studies of both inher- 
ited and acquired disease. The samples being studied: 

• were initially drawn from urban populations in four 
countries (HapMap Nigerian, Japanese, Chinese, and 
families with European ancestry from the USA) but 
additional samples will be drawn from elsewhere, 
including South Asia 

• were from anonymous individuals with no 
phenotypic information provided (except sex, 
population of origin and that they are adults 
competent to provide consent), but do have consent 
for free web release of the sequence data, and are 
available as cell-lines to researchers 

• because full ×30-depth sequencing to detect
heterozygous variants in a single individual 
effectively would be too costly, low coverage 
sequencing of samples from many individuals is 
being used to capture the variants shared between 
individuals; the full project aims to detect essentially 
all polymorphic variants (i.e. those at >1% 
frequency) in the geographical areas investigated 
that lie in the accessible part of the genome. 

1000 Genomes pilot studies 

• Samples (179) at depth ×2-4 should identify most
variants present at >5% - so far, 17.2 million SNPs 
have been found, of which >50% were novel. 

• Two trios sequenced at ×30-60 revealed many 'new'
(de novo) mutation events (see below)

• Approximately 1,000 genes were sequenced deeply 
in samples from 700 volunteers. 
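The logic of pooling low-coverage data across many individuals can be sketched with some simple arithmetic. The sketch below is an illustration under stated assumptions, not the project's actual variant-calling method: it treats the number of reads carrying a variant, summed over all samples, as a Poisson variable with mean (number of individuals × mean depth × allele frequency), and it uses an arbitrary, hypothetical requirement of at least four supporting reads.

```python
from math import exp, factorial


def expected_variant_reads(n_individuals, mean_depth, allele_frequency):
    """Expected number of reads carrying the variant, pooled over all samples."""
    return n_individuals * mean_depth * allele_frequency


def prob_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    return 1.0 - sum(exp(-mean) * mean**i / factorial(i) for i in range(k))


# 179 samples at depth 2 (the lower end of the pilot range quoted above).
for freq in (0.05, 0.01, 0.001):
    lam = expected_variant_reads(179, 2, freq)
    print(f"frequency {freq:.3f}: ~{lam:.1f} variant reads expected, "
          f"P(at least 4 reads) = {prob_at_least(4, lam):.3f}")
```

Under these assumptions, variants at a frequency of 5% are supported by many reads across the pooled samples, whereas much rarer variants are often not seen at all, which is in keeping with the frequency thresholds quoted for the pilot and for the full project.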

The next task: it will be necessary to draw upon large 
population studies to seek evidence of variable mutation 
penetrance. 

Mosaicism and foetal DNA in the maternal 
circulation 

High-throughput sequencing is also opening up the de- 
tection of mosaicism, which is likely to have many bio- 
logical consequences in addition to neoplasia [28]. With 
a read depth of ×30, it is difficult to detect mosaicism at
or below the level of 5% because of the 1% error in typ- 
ing each nucleotide position. This could have applications 
in the study of cancers, e.g. the evolution of individual 
tumours as reflected in tumour-specific DNA [29], and in 
prenatal diagnosis, where free foetal DNA is currently used 
to look for alleles not present in the mother, such as Rhesus 
or Y chromosome sequences, but is likely soon to be applicable
to population screening, especially for trisomy 21 [30-32].
Greater read depths will be required for clinically important
applications reliant upon the detection of low levels
of mosaicism [33]. 
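The difficulty of detecting 5% mosaicism at a read depth of 30 can be made concrete with a simple binomial sketch. It rests on several simplifying assumptions that are not taken from the meeting: every read is an independent draw, 5% is interpreted as the fraction of reads carrying the variant, each read has a 1% chance of showing the variant through error alone, and a call is made only when the number of variant reads is unlikely (probability at most 0.001, an arbitrary threshold) to have arisen from errors.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_reads_to_call(depth, error_rate, alpha):
    """Smallest variant-read count that errors alone reach with probability <= alpha."""
    k = 0
    while binom_tail(depth, error_rate, k) > alpha:
        k += 1
    return k


def detection_power(depth, variant_fraction, error_rate=0.01, alpha=0.001):
    """Chance that a true variant at `variant_fraction` reaches the calling threshold."""
    k = min_reads_to_call(depth, error_rate, alpha)
    return binom_tail(depth, variant_fraction, k)


for depth in (30, 100, 300):
    print(f"depth {depth}: call threshold {min_reads_to_call(depth, 0.01, 0.001)} reads, "
          f"power at 5% = {detection_power(depth, 0.05):.2f}")
```

On these assumptions, a depth of 30 gives only a small chance of distinguishing a 5% variant from sequencing error, and the chance rises substantially as the depth is increased tenfold.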

Phenotypes and taxonomy 

Craddock led a discussion on disease phenotypes and their 
relation to disease taxonomy, especially in relation to psy- 
chiatric disorders. The definition of a phenotype is crucial 
as it may be influenced by numerous factors including 'cul- 
ture' and tradition, established or desired patterns of health 
service utilisation, interacting environmental variables and 
previously accepted models of disease causation. 

There has been a growing awareness that the specific sin- 
gle genes in which mutations cause a few rare disorders 
may also be (more loosely) associated with disease predis- 
positions apparent in GWAS. Thus, rare mutations in the 
CACNA1C gene can result in a multisystem disorder, manifesting
as cardiac dysrhythmias, epilepsy, autism, cognitive
impairment and abnormal physical features, whereas vari- 
ation in the same gene is more weakly associated with bipo- 
lar disease, schizophrenia and unipolar depression, as 
evidenced by GWAS. The benefits of identifying specific 
genes in which variation contributes to disease susceptibil- 
ity include understanding the pathogenesis, establishing a 
diagnosis, devising new therapies and selecting the most 
appropriate treatment for the individual patient. 

Following the course of human genetics research over 
the past 15 years, one may perceive a shift in focus from 
rare to common diseases and from genes of large effect 
in which mutations cause disease to genes of small effect 
in which variants merely modify the risk of disease. 
Mutations in genes of large effect are usually accepted 
unequivocally as important causal factors; the modifying 
genes of lesser effect can, however, also provide insights 
into pathogenesis when several loci in the same develop- 
mental or functional pathway interact to contribute to a 
specific disease. It is also clear that defining the pheno- 
type is, itself, an iterative process, taking the researcher 
back and forth between the phenotype and the genotype. 
The researcher can use different values of a given parameter
in order to establish which yields the clearest discrimination
between individuals with and without disease.

The finding that sequence variation at some of the 
same loci influences both schizophrenia and bipolar disease
has cast doubt on the validity of the Kraepelinian
dichotomy of the psychoses into these categories [34,35]. 
The exclusion of 'intermediate' categories of patients 
from much previous research had in effect prevented 
this dogma from being challenged in the past - it became 
a closed 'view of the world' (a prophecy that was both 
dogmatic and self-fulfilling). 

More recently still, it has become clear that new muta- 
tions at many loci contribute very substantially to 
schizophrenia, confirming the indications from studies 
of de novo CNVs that de novo rare mutations contribute
importantly to this relatively common disease [36,37]
and demonstrating that the reservations of Krawczak
and others about the CD/CV model were warranted.

Mutation and selection at single loci 

There is still much to be learned from the detailed study 
of single loci, in addition to genome-wide research that 
does not assume the applicability of the CD/CV model 
of complex disorders. As an example, Wilkie presented 
information from his long term studies of the fibroblast 
growth factor receptor (FGFR) genes. 

As background to the FGFR studies, germline mutation
rates have been estimated to be about 2.5 × 10^-8
[38] or 1.8 × 10^-8 [39] - equivalent to approximately 120
nucleotide substitutions per birth. The fitness of amino 
acid substitutions has been estimated [40,41], and purify- 
ing selection is thought to act on 2.5% to 5% of the gen- 
ome (of which 1% will be coding sequence). This will 
lead to 1 to 3 harmful amino acid substitutions per birth, 
plus a similar number from each previous generation. 
Recent data from the 1000 Genomes Project suggest a 
slightly lower rate of new mutations, closer to 10^-8 per
base pair per generation, and that the influence of puri- 
fying selection can be detected through a reduction in 
genetic variation at distances of up to 100 kb from 
genes, implying that few parts of the genome are entirely 
unaffected by such selection. Similarly low rates of mutation
have been found by others [42].
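The step from a per-nucleotide mutation rate to a number of new substitutions per birth is a single multiplication, sketched below. The rates used are of the order of 10^-8 per base pair per generation, and the diploid genome size of about 6.4 × 10^9 base pairs is an assumption introduced here for illustration rather than a figure from the meeting; the function name is likewise hypothetical.

```python
# Assumption: roughly 3.2e9 bp per haploid genome, two copies per individual.
DIPLOID_GENOME_BP = 6.4e9


def new_mutations_per_birth(rate_per_bp_per_generation, genome_bp=DIPLOID_GENOME_BP):
    """Expected number of new nucleotide substitutions in one newborn."""
    return rate_per_bp_per_generation * genome_bp


for rate in (2.5e-8, 1.8e-8, 1.0e-8):
    print(f"rate {rate:.1e} per bp per generation -> "
          f"about {new_mutations_per_birth(rate):.0f} new substitutions per birth")
```

With this assumed genome size, rates around 2 × 10^-8 give figures of the same order as the approximately 120 substitutions per birth quoted above, and a rate closer to 10^-8 roughly halves that number.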

Wilkie described how he set out to explain the appar- 
ently very high levels of mutation in specific genes using 
the FGFR genes as a model. The most frequent transi- 
tion mutation in the human germline is the FGFR3 mu- 
tation that causes achondroplasia, but Wilkie chose to 
study the Apert syndrome mutation 755 C > G in 
FGFR2, which is the most common transversion muta- 
tion in humans. The mutations originate exclusively 
from the healthy fathers of the affected individuals, who 
tend to be older than average for the population (pater- 
nal age effect). From the analysis of normal sperm and 
testes, it can be deduced that the positions that mutate 
in FGFR genes are not true mutation hotspots; rather, 
there is positive selection for spermatogonia carrying the 
mutation because the mutation confers gain-of-function 
properties upon the encoded protein [43-45]. Wilkie 
looked for these mutations in an uncommon testicular 
tumour - spermatocytic seminomas - and a comparable 
spectrum of mutations was found here as in congenital 
disorders and in bladder tumours: they all act on the 
Ras/MAPK pathway, which plays a central role in regu- 
lating proliferation and other critical cellular processes. 
The testis may be viewed as a 'bioreactor' for selfish 
mutations that promote the clonal growth of spermatogonia
harbouring one of these growth-promoting
mutations [46]. The effects of each mutation vary with
the degree to which it is activating: 

1) highly activating mutations: lethal disorders or 
testicular tumours 

2) moderately activating mutations: congenital 
malformation syndromes exhibiting a paternal age 
effect 

3) weakly activating mutations: rare sequence variants 
that may predispose to a wide spectrum of 
disorders, involving for example neurodevelopment 
and cancer predisposition 

It should be noted that the last category of mutations
will be undetectable by GWAS, and so may contribute
to missing heritability [47].

The paternal age effect for schizophrenia may occur 
through a similar mechanism. It should be remembered 
that more than 1% of individuals with schizophrenia have a
de novo CNV, often a rare but recurrent CNV [48], while 
CNVs have been shown to be much more common in 
cases of schizophrenia than in controls (5% of controls, 
15% to 20% of cases) [49] and, interestingly, to overlap 
with the CNVs found in attention deficit hyperactivity 
disorder [50]. However, the frequency of de novo CNVs 
is high enough in controls that such de novo events cannot
be taken as ipso facto proof of pathogenicity.

Pharmacogenetics 

Chiano led a discussion on pharmacogenetics and med- 
ical practice. Genetics is just one of the factors that in- 
fluence the safety and efficacy of particular drugs. These 
factors include dosage, 'the environment', compliance,
other drugs, diet, age and the co-occurrence of other 
diseases. Efficacy can vary from 80% for COX-2 inhibi- 
tors to 25% for cancer chemotherapy. The abacavir- 
associated hypersensitivity reaction can occur in 5% to 
8% of patients; there is a major susceptibility locus in the HLA region
[51,52]. Proponents have been disappointed at the
slow pace of introduction of pharmacogenetic testing 
into clinical practice. Meanwhile, somatic genetic tests 
are proving to be helpful in guiding the treatment of 
some malignancies but have been slower to enter regular 
clinical practice for a wider set of indications despite the 
longstanding recognition of the relevant pharmacoge- 
netic phenomena: additional criteria of utility and cost- 
effectiveness also need to be fulfilled [53]. 

Privacy and consent 

Ruth Chadwick (Cardiff, Wales) led a discussion on 
issues of privacy and consent in genomic research. It is 
becoming impossible for medical researchers to guarantee 
privacy to the research participants they recruit - especially 
with the pressure from funding agencies who insist upon 
open-access archiving of genomic sequence data, as these
data inevitably contain potentially identifying information. 
Indeed, it would now be misleading to promise privacy of 
personal genome information to research participants in 
exchange for consent to donate samples. 

There are difficulties at the level of the technology 
(sample collection and storage, and data generation, 
storage and access) and the interpretation of information 
collected. It has been argued that the very concept of 
privacy will have to be renegotiated in the context of the 
rationale(s) for data sharing, especially at international 
level. Indeed, there has been a move from 'traditional' 
notions of informed consent to one of a broad consent 
and then to open consent. One must either strengthen 
the traditional practices of consent - ensuring that participants
understand the difficulties of ensuring privacy
before giving consent (while at the same time pursuing 
the data protection strategies that are possible) - or re- 
think the whole notion of privacy [54]. In clinical prac- 
tice, there may be recurrent difficulties when molecular 
studies of a malignancy, designed to guide therapeutics, 
also yield unwelcome information about prognosis
or implications for close relatives [55]. In the context
of research, however, new forms of consent (e.g. broad 
consent taken online) may well be readily accepted [56]. 
One approach is for the researcher to move from guaran- 
teeing (and so protecting) privacy to practising veracity, 
explaining to potential participants that their data will be 
accessed by, and shared with, others; those consenting to 
research have to be open to this [57]. 

Those participating in 'personal genomics' would be 
deluding themselves if they thought that privacy was still 
possible once their genome sequences became accessible 
online [58,59]. This is the age of bioinformatics, surveillance,
Facebook and Twitter. There is inevitably a risk of
personal consequences for research participants when 
their data are released. These will take different forms 
under different circumstances. Access to health care 
may become more difficult in countries without state 
health care or universal and compulsory health insur- 
ance, as when health care is made available through 
commercial insurance schemes. In addition, if informa- 
tion about other family members becomes available, 
then biological family relationships might be shown to 
differ from the pattern of social relationships presented 
in public (e.g. paternity may have been misattributed). 

One perspective reveals a clash of research cultures: 
molecular scientists (who seek instant open access to all 
data to maximise the scientific - and perhaps commer- 
cial - exploitation of the data) and the clinicians (who 
are often more insistent upon protecting their patients 
and more modest in their assessment of the likely short- 
term benefits to the participants, and other patients, of 
the research itself). 



One key notion is that of the acceptable uses for re- 
search data. Participants tend to trust 'the system' to use 
their data in a 'good' way. The system of research gov- 
ernance is designed (in part) to ensure that such trust is 
warranted - although the system of governance also 
serves other, more institutional purposes. The prospect 
of internet-based marketing corporations using access to 
research data and to electronic health records as an op- 
portunity to market more products seems both manipu- 
lative and cynical. 

The concept of 'consent' is also problematic in ways un- 
related to the concept of privacy. As Raymond explained, 
families participating in research studies could allow their 
genuine altruism to lead them to sign the research consent 
forms because their thoroughly worthy impulses blinded 
them to the potential for problems to arise from participa- 
tion. Such problems might include results of uncertain 
significance, which may remain difficult to interpret for 
many years or whose significance might only be clarified 
through the performance of family studies that could be 
confusing or distressing; even with such family studies, 
the significance of the results may remain opaque. If the 
reasons underlying the importance and relevance of con- 
sent to research were explained, then many families might 
decline participation. 

One response to these possibilities has been the sug- 
gestion that a solution should be found through infor- 
mation technology itself, with the development of a 
system of 'data enclaves' within which researchers can
utilise data in a predefined analysis but from which they 
cannot export any data about individuals - only aggre- 
gate data and the derived results of analysis. 

Limited clinical utility of genetic association 

Wallace set out her answer to the important question, 
'What are the (biological) limits to the predictive value 
and clinical utility of the 'predict and prevent' strategy?' 
An early approach to this led to the conclusion that en- 
vironmental interventions targeted by genotype might 
significantly reduce the incidence of some diseases [60], 
dependent on the magnitude of gene-environment inter- 
actions (in the statistical sense of their influence on risk 
at a population level). However, there was an error in 
the measure of clinical utility used in this paper, and in 
addition, it took no account of the proportion of the 
population in the high-risk group (thus, erroneously sug- 
gesting that a genetic test which identified the whole 
population as at high risk would have high clinical utility). 
Further, the aetiology of a given disease will place con- 
straints on the potential magnitude of both the genetic 
component of a disease and of any gene-environment 
interaction. Wallace set out her approach to assessing 
clinical utility in the context of genetic testing [61]. 
Utility, Uge = population impact - gamma, where
gamma is the fraction of the population classified as at high
genotypic risk, and the population impact is the avoidable
proportion of disease occurring in this fraction of the population,
compared to the population as a whole.

Utility is larger if gamma is small, and targeting is only 
effective in the presence of GxE interactions. If there is 
no GxE interaction, genotyping performs no better than 
random selection in terms of reducing the incidence of 
the disease in a population. If high-risk individuals have 
less to gain from intervention than the low-risk group, 
then targeting has negative utility. There is a range of 
solutions for any given disease, which can be mapped 
out if data on relative risks from both twin and family 
studies, plus environmental data, are known. Each solution
depends on the model that is assumed for gene-gene
and gene-environment interactions. Because of the
number of alternative models, the sample sizes required to 
evaluate or validate risk predictions may be too high, and it 
could be impossibly difficult to generate good association 
(GWAS) data and integrate it with environmental information
[62]. The model also confirms earlier findings that heritability
estimates can be high even in the complete absence
of any genetic component to a disease [63] and that gene- 
gene and gene-environment interactions, as well as the 
equal environments assumption, may account for some (or 
even all) of the 'missing heritability' of complex diseases. 
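The utility measure defined above can be illustrated with a small numerical sketch. This is not Wallace's published model: it assumes a simple two-group population in which the intervention produces a fixed absolute risk reduction in each group, and the names `benefit_high` and `benefit_low` are hypothetical parameters introduced only for this illustration.

```python
def population_impact(gamma, benefit_high, benefit_low):
    """Share of all avoidable cases that occur in the high-risk group.

    gamma        -- fraction of the population classified as high genotypic risk
    benefit_high -- assumed absolute risk reduction from intervention, high-risk group
    benefit_low  -- assumed absolute risk reduction from intervention, low-risk group
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    avoidable_high = gamma * benefit_high
    avoidable_low = (1.0 - gamma) * benefit_low
    total = avoidable_high + avoidable_low
    if total <= 0.0:
        raise ValueError("the intervention avoids no cases in this population")
    return avoidable_high / total


def targeting_utility(gamma, benefit_high, benefit_low):
    """Utility = population impact - gamma, as defined in the text."""
    return population_impact(gamma, benefit_high, benefit_low) - gamma


# Equal benefit in both groups (no interaction, in this toy sense): utility is ~0
no_interaction = targeting_utility(0.1, 0.05, 0.05)

# High-risk group gains more from intervention: utility is positive
positive_case = targeting_utility(0.1, 0.20, 0.05)

# High-risk group gains less from intervention: utility is negative
negative_case = targeting_utility(0.1, 0.01, 0.05)

# Whole population classified as high risk: utility is ~0, not high
everyone_high_risk = targeting_utility(1.0, 0.20, 0.05)
```

Under these assumptions the sketch reproduces three of the qualitative points made above: targeting by genotype is no better than random selection when both groups benefit equally, it has negative utility when the high-risk group has less to gain, and a test that labels everyone as high risk has zero utility.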

It should be noted that recent modelling of heritability 
and GWAS data suggests that a much greater fraction of
the heritability of quantitative traits can be accounted 
for by GWAS data than had been found previously, once 
allowance is made for the incomplete linkage disequilib- 
rium between the genotyped SNPs and the 'true' causal 
factors [64-66]. However, such modelling is still unable 
to identify the factors involved and cannot deliver 
improved capacity to predict disease. Furthermore, only 
a small proportion of the calculated heritability has yet 
been identified for any common disease [67].

Conclusion 

The space provided by this workshop and the subsequent 
e-mail exchanges has served to clarify our thoughts and 
allowed several modest conclusions to emerge, although 
unanimity was not achieved on all issues. Four points, in 
particular, became progressively clearer during these 
discussions. 

First, the new insights derived from the application of 
high throughput sequencing are likely to transform our 
understanding of human biology, especially in the con- 
text of human genetic disease and evolution. Clinical 
applications of human genome sequencing are emerging
in oncology and reproductive genetics and are beginning 
to impact upon other clinical areas and disciplines. The 
old debates about nature vs. nurture are likely to re-emerge
in new guises, and great care will be needed to ensure that
fruitless debate does not consume too much energy and that
misconceived arguments do not lead to the inappropriate
application of theory in policy and practice across various walks
of life, most especially psychiatry, education and the law.

Second, and despite a full acceptance of this first 
point, major problems remain in moving from the rap- 
idly accumulating raw sequence data and catalogues of 
genetic variation to warranted conclusions in their inter- 
pretation. These problems include questions relating to 
the clinical utility of conclusions drawn on the basis of 
genetic association studies that lack a plausible patho- 
genic mechanism underlying the observed association. 
There will inevitably be uncertainties in the face of a 
morass of interacting variables, but some of the difficul- 
ties should be resolved over time - although perhaps on 
a timescale of decades rather than months. In addition, 
our current measures of environmental factors are 
primitive, and our ability to record the variation in such 
factors over a lifetime of many years is still crude. 

Finally, and despite the continuing advances in infor- 
mation technologies, the accumulation of both genomic 
and environmental data may outstrip the capacity of our 
information systems to store and analyse the data [68]. 
These limits to computability may not be merely prac- 
tical in nature - to be overcome once technology has 
moved on - but may represent constraints as to what 
can be known or calculated even in principle, given lim- 
its to global population size, the nonrandom distribution 
of genotypes and the exponentially increasing number of 
statistical comparisons that can be made; how can one 
correct adequately for such astronomical numbers of 
comparisons? Deductive, hypothesis-driven research on 
the basis of such data will be challenging! 
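The scale of the multiple-comparisons problem raised above can be made concrete with a short calculation. The sketch below is illustrative only: the variant count of 1000 is an arbitrary example, and the Bonferroni correction is used as one conventional adjustment; the report itself does not name a specific correction method.

```python
from math import comb


def n_interaction_tests(n_variants, order):
    """Number of distinct sets of `order` variants drawn from `n_variants`."""
    return comb(n_variants, order)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# With only 1000 variants, pairwise and three-way interaction tests already
# number in the hundreds of thousands and hundreds of millions respectively.
pairwise = n_interaction_tests(1000, 2)
three_way = n_interaction_tests(1000, 3)

# The per-test threshold needed to keep the family-wise error rate at 0.05
# shrinks in proportion to the number of tests.
pairwise_threshold = bonferroni_threshold(0.05, pairwise)
```

Each added environmental variable or higher order of interaction multiplies the number of tests again, while the number of people available to study is bounded, which is the constraint in principle that the paragraph above describes.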

Acquiring the necessary genome sequence data appears 
now to be almost the easy step by comparison with the 
quantification of environmental factors and of human 
phenotypes, the physical limits to data storage and con- 
straints in principle to computability and interpretation. 